CHL5230H-Applied Machine Learning for Health Data

Instructor: Zahra Shakeri – Fall 2023
Dalla Lana School of Public Health-University of Toronto
Datathon #5


Datathon Description and Instructions

Datathon Context and Objective

Mobile health (mHealth) has emerged as a transformative solution for public health, leveraging the widespread use of mobile devices to improve healthcare outcomes. mHealth applications range from telemedicine services, which allow for remote diagnosis and treatment, to health monitoring apps that encourage proactive health management. The importance of mHealth in public health is multifaceted:

  1. Accessibility: mHealth technologies make healthcare services more accessible, especially in remote or underserved areas, by bridging geographical barriers.
  2. Preventive Care: With real-time data collection, mHealth enables early detection of health issues, promoting preventive care over reactive treatment.
  3. Patient Engagement: mHealth tools empower patients to take charge of their health through fitness trackers, medication reminders, and personalized health information.
  4. Disease Surveillance: mHealth is pivotal in disease surveillance and management, facilitating the tracking of disease outbreaks and the monitoring of public health interventions.
  5. Cost-Effectiveness: By reducing the need for in-person visits and enabling efficient resource allocation, mHealth can lead to significant cost savings for both healthcare providers and patients.

The integration of mHealth within Canada’s healthcare landscape is indicative of a broader global shift towards technologically enhanced healthcare delivery. With internet penetration at 93.8 percent and cellular connections accounting for 98.5 percent of the population as of early 2023, Canada showcases a high degree of digital adoption, a prerequisite for mHealth proliferation. mHealth’s potential to improve healthcare outcomes is significant, providing a platform for mobile devices, including smartwatches and smartphones, to facilitate health data collection and analysis. Machine Learning augments mHealth’s capabilities by analyzing this data, offering personalized health insights and predictive analytics, thus transforming patient care and health management. Together, mHealth and ML are redefining patient engagement, optimizing healthcare delivery, and paving the way for a future where healthcare is more accessible, efficient, and tailored to individual needs.

Datasets Information

The dataset for this Datathon is a collection of multi-dimensional time-series data acquired from wearable sensors placed on subjects’ left ankles and right lower arms. It encapsulates raw sensor outputs such as acceleration and gyroscopic measurements across three axes for each location. These sensor readings are annotated with activity labels ranging from sedentary behaviors to dynamic movements, providing a rich context for understanding human motion.

Health data science, particularly when fused with Machine Learning, plays a pivotal role in transforming raw health data into actionable insights. By applying ML algorithms to such datasets, we can develop models capable of recognizing and classifying human activities. This has profound implications for healthcare, offering potential advancements in areas such as remote patient monitoring, rehabilitation, personalized fitness programs, and early detection of health conditions through movement patterns. The dataset provides a foundation to train and validate such models, embodying the importance of precise and comprehensive data in developing robust ML applications for health.
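As an illustration of how raw sensor streams are commonly prepared for activity classification, the sketch below cuts a sequence of readings into fixed-length windows and summarizes each channel by its mean and standard deviation. It is a minimal sketch under stated assumptions, not a required or prescribed method: the function name `make_windows`, the window size, the step, and the toy values are all hypothetical, and the handout does not state the sensors' sampling rate.

```python
from collections import Counter
from statistics import mean, pstdev


def make_windows(rows, channels, window_size, step):
    """Cut a sequence of sensor readings into fixed-length windows.

    rows: list of dicts, one per time step, each holding the sensor
          channels plus an integer "Activity" code.
    Returns a list of (features, label) pairs. features holds the mean
    and population standard deviation of every channel in the window;
    label is the most common activity code in the window.
    """
    samples = []
    for start in range(0, len(rows) - window_size + 1, step):
        window = rows[start:start + window_size]
        features = []
        for channel in channels:
            values = [row[channel] for row in window]
            features.append(mean(values))
            features.append(pstdev(values))
        label = Counter(row["Activity"] for row in window).most_common(1)[0][0]
        samples.append((features, label))
    return samples


# Toy readings, made up for illustration (not taken from the dataset).
toy_rows = [
    {"alx": 1.0, "Activity": 1},
    {"alx": 3.0, "Activity": 1},
    {"alx": 2.0, "Activity": 4},
    {"alx": 6.0, "Activity": 4},
]
toy_samples = make_windows(toy_rows, ["alx"], window_size=2, step=2)
print(toy_samples)
```

In practice, windows would likely be built separately for each volunteer so that no window mixes readings from two subjects, and the resulting feature vectors could then be passed to any classifier the team chooses.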

Column    Description
alx       Acceleration from the left-ankle sensor (X-axis)
aly       Acceleration from the left-ankle sensor (Y-axis)
alz       Acceleration from the left-ankle sensor (Z-axis)
glx       Gyro from the left-ankle sensor (X-axis)
gly       Gyro from the left-ankle sensor (Y-axis)
glz       Gyro from the left-ankle sensor (Z-axis)
arx       Acceleration from the right-lower-arm sensor (X-axis)
ary       Acceleration from the right-lower-arm sensor (Y-axis)
arz       Acceleration from the right-lower-arm sensor (Z-axis)
grx       Gyro from the right-lower-arm sensor (X-axis)
gry       Gyro from the right-lower-arm sensor (Y-axis)
grz       Gyro from the right-lower-arm sensor (Z-axis)
subject   Volunteer number
Activity  Corresponding activity, coded as follows:

  0: ‘Nothing’
  1: ‘Standing still (1 min)’
  2: ‘Sitting and relaxing (1 min)’
  3: ‘Lying down (1 min)’
  4: ‘Walking (1 min)’
  5: ‘Climbing stairs (1 min)’
  6: ‘Waist bends forward (x20)’
  7: ‘Frontal elevation of arms (x20)’
  8: ‘Knees bending (crouching) (x20)’
  9: ‘Cycling (1 min)’
  10: ‘Jogging (1 min)’
  11: ‘Running (1 min)’
  12: ‘Jump front & back (x20)’

Note: The number in brackets indicates the number of repetitions (xN) or the duration of the exercise (min).
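The column descriptions can be encoded directly in code, which helps keep plots and reports readable. The following is a minimal sketch, assuming the columns are named exactly as listed in this section; the helper `split_by_subject` and the toy rows are hypothetical and only illustrate one way to hold out whole volunteers, so that the same person's readings do not appear in both the training and the evaluation set.

```python
# Activity codes as listed in the column description.
ACTIVITY_LABELS = {
    0: "Nothing",
    1: "Standing still (1 min)",
    2: "Sitting and relaxing (1 min)",
    3: "Lying down (1 min)",
    4: "Walking (1 min)",
    5: "Climbing stairs (1 min)",
    6: "Waist bends forward (x20)",
    7: "Frontal elevation of arms (x20)",
    8: "Knees bending (crouching) (x20)",
    9: "Cycling (1 min)",
    10: "Jogging (1 min)",
    11: "Running (1 min)",
    12: "Jump front & back (x20)",
}

# The twelve sensor channels: acceleration (a) and gyro (g), for the
# left ankle (l) and right lower arm (r), on the x, y, and z axes.
SENSOR_COLUMNS = [
    "alx", "aly", "alz", "glx", "gly", "glz",
    "arx", "ary", "arz", "grx", "gry", "grz",
]


def split_by_subject(rows, test_subjects):
    """Split rows so that held-out volunteers appear only in the test set."""
    held_out = set(test_subjects)
    train = [row for row in rows if row["subject"] not in held_out]
    test = [row for row in rows if row["subject"] in held_out]
    return train, test


# Toy rows, made up for illustration; the real subject identifiers
# may be formatted differently.
toy_rows = [
    {"subject": 1, "Activity": 4},
    {"subject": 1, "Activity": 9},
    {"subject": 2, "Activity": 4},
    {"subject": 3, "Activity": 11},
]
train_rows, test_rows = split_by_subject(toy_rows, test_subjects=[3])
print(len(train_rows), len(test_rows))
print(ACTIVITY_LABELS[test_rows[0]["Activity"]])
```

Splitting by volunteer rather than by row is a design choice, not a requirement of the Datathon; it tends to give a more honest estimate of how a model would perform on a person it has never seen.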

The dataset will be made available under Modules/Datathon #5 at 6:45 pm on Tuesday, November 14, 2023.

Instructions for Submission

You are encouraged to discuss your work with your teammates and other teams and can use online and offline resources. However, all members of your team should make substantial, meaningful contributions to your submission, ensuring fairness to all participating teams in this datathon. Teams must submit the materials described below by two deadlines: the in-class deadline at 8:00 PM on November 14 and the final deadline at 2:00 PM on November 21. It is advisable for teams to work consistently from the outset on deliverables rather than attempting to complete them all within the last hour. You should begin work on the deliverables at least three days before the final deadline.

Components of Submission

1. Low-fidelity Prototype (In-class Submission)

The first phase of this Datathon involves collaborative efforts among students, aimed at transforming the provided dataset into actionable insights. Teams should formulate research questions and outline their data analysis plans, then submit a low-fidelity prototype of their solution to Assignments/Datathon #5/Low-fidelity Prototype. Please adhere to the naming convention outlined later in this document when naming your one-page PDF submission for today.

Every team is required to submit their low-fidelity prototype through Quercus by 8:00 PM on November 14th, 2023. A successful submission should include a clear and legible list of research questions that you plan to address using the provided datasets. Additionally, provide a detailed plan specifying the analysis methods (e.g. machine learning) you intend to employ for addressing these questions. Ensure that each research question corresponds to its respective analysis plan.

Please note that you are not obligated to finalize your solution or research questions at this stage. If you come up with a better idea during the week, feel free to update your plan. The primary goal of the low-fidelity milestone is to initiate the brainstorming phase of a data science project, which is typically the initial and most critical phase. It allows you to see how the project’s direction may evolve during your analysis.

2. A High-fidelity Prototype

All teams are expected to submit their analysis results and deliver brief presentations (2 minutes for the presentation, followed by 1 minute for questions) consisting of a minimum of 2 and a maximum of 3 slides. The purpose of these presentations is to guide your instructor and TA(s) on how you leveraged the available data to address the research question you formulated.

During your presentation, cover essential elements, including meaningful results, the data analysis process, challenges encountered, and key findings. While you have the flexibility to decide the presentation’s content, it should focus on conveying a clear understanding of the analytical process, findings, and conclusions. In essence, the presentation should provide a condensed version of the written report.

To allow the TA to prepare teams’ presentations effectively, it is imperative that teams finalize their submissions by 2:00 PM on November 21, 2023.

3. A Written Report

Teams are required to compile a report that details the steps taken to address their proposed question or prompt. While there is not a prescribed format for the report, it should encompass key sections such as:

  • Introduction: Explain the questions you aimed to answer with the data and their significance.
  • Data Engineering Process: Describe how you cleaned and prepared the data and specify the datasets used.
  • Analysis: Outline the learning and analysis techniques employed, along with the rationale behind their selection.
  • Findings: Present your discoveries and insights.
  • Conclusion: Summarize what health practitioners can infer from your team’s work.
  • Individual Contributions: Highlight the contributions of each team member throughout the entire process.
  • Code and Presentation: Host your Datathon materials, including notebooks and datasets, on GitHub. Share the GitHub project link in the report for easy access by the TA. Also, use Google Slides to host your presentation and provide the public link in the report.

Note: When submitting your report to Quercus, please consolidate all components into one PDF file and include links to other relevant elements within the report. Name your file following the format: Team Number-CHL5230-F23 (e.g., 25-CHL5230-F23.PDF). Submissions not adhering to this naming convention will not be considered for grading. Additionally, ensure that you include your team number and the names of all team members in your report.

At a minimum, the report should cover the question addressed, findings, the data analysis process, and a conclusion. The report must not exceed two pages in length. While the code should be functional and produce the reported results, it will not be evaluated based on code quality.

Ensure that all materials are submitted by 2:00 pm on November 21st. No late submissions will be accepted.

This Datathon is pretty free-form! This is intentional; projects you work on in industry will rarely be very specific. Please feel free to show early results to me to get some feedback you can use to ensure a successful submission!

Important Dates

Component               Due Time              Where to Submit?
Data Availability       November 14, 6:45 pm  Modules/Datathon #5
Low-fidelity Prototype  November 14, 8:00 pm  Assignments/Datathon #5/Low-fidelity Prototype
Written Report          November 21, 2:00 pm  Assignments/Datathon #5/Written Report